Action selection for stochastic, delayed reward

Author

  • Herbert Jaeger
Abstract

The paper gives a novel account of quick decision making for maximising delayed reward in a stochastic world. The approach rests on observable operator models of stochastic systems, which generalize hidden Markov models. A particular kind of decision situation is outlined, and an algorithm is presented that estimates the probability of future reward at a computational cost of only O(im), where i is the number of action alternatives and m is the model dimension.
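
To make the O(im) figure concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes, beyond what the abstract states, that the current model state is an m-dimensional vector and that each of the i actions comes with a precomputed m-dimensional reward functional, so that scoring one action is a single dot product. The names `score_actions`, `best_action`, `state`, and `reward_functionals`, and all the numbers, are hypothetical.

```python
from typing import Dict, List, Tuple


def score_actions(state: List[float],
                  reward_functionals: Dict[str, List[float]]) -> Dict[str, float]:
    """Score every action with one length-m dot product: O(i*m) overall.

    Assumption: each action has a precomputed vector whose dot product
    with the state estimates the probability of future reward.
    """
    return {
        action: sum(r * w for r, w in zip(functional, state))
        for action, functional in reward_functionals.items()
    }


def best_action(state: List[float],
                reward_functionals: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the action with the highest estimated reward probability."""
    scores = score_actions(state, reward_functionals)
    action = max(scores, key=scores.get)
    return action, scores[action]


# Hypothetical example with m = 3 state dimensions and i = 2 actions.
state = [0.5, 0.25, 0.25]
reward_functionals = {
    "left": [0.5, 0.0, 1.0],
    "right": [1.0, 0.5, 0.0],
}
print(best_action(state, reward_functionals))
```

Under these assumptions the decision step touches each of the i vectors exactly once, which is where a cost linear in both i and m would come from.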

Similar resources

Best Action Selection in a Stochastic Environment

We study the problem of selecting the best action from multiple candidates in a stochastic environment. In such a stochastic setting, when taking an action, a player receives a random reward and incurs a random cost, which are drawn from two unknown distributions. We aim to select the best action, the one with the maximum ratio of the expected reward to the expected cost, after exploring...

MODIFIED ACTION VALUE METHOD APPLIED TO ‘n’-ARMED BANDIT PROBLEMS USING REINFORCEMENT LEARNING

Reinforcement Learning (RL) is an area of Artificial Intelligence (AI) concerned with how an agent should take actions in a stochastic environment so as to optimize a cumulative reward signal. This paper investigates a modified approach to action value methods used to solve n-armed bandit problems, where one is repeatedly faced with a choice among n different options. The selection of the action ma...

Reward-Risk Portfolio Selection and Stochastic Dominance

The portfolio selection problem is traditionally modelled by two different approaches. The first one is based on an axiomatic model of risk-averse preferences, where decision makers are assumed to possess an expected utility function and the portfolio choice consists in maximizing the expected utility over the set of feasible portfolios. The second approach, first proposed by Markowitz (1952), ...

Uncertain Delayed Renewal Reward Process and Its Applications

An uncertain process is a sequence of uncertain variables indexed by time. This paper aims to introduce a kind of uncertain process named the uncertain delayed renewal reward process, whose interarrival times and rewards (or costs) are regarded as uncertain variables, with the first interarrival (i.e., renewal) time and reward different from the others, respectively. The main results include the uncert...

Second Order Stochastic Dominance, Reward-Risk Portfolio Selection and the CAPM

Starting from the reward-risk model for portfolio selection introduced in De Giorgi (2004), we derive the reward-risk Capital Asset Pricing Model (CAPM) analogously to the classical mean-variance CAPM. The reward-risk portfolio selection arises from an axiomatic definition of reward and risk measures based on a few basic principles, including consistency with second order stochastic dominance. Wi...

Publication date: 1999